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Abstract: Finding the structure of a graphical model has been received 
much attention in many fields. Recently, it is reported that the non- 
Gaussianity of data enables us to identify the structure of a directed 
acyclic graph without any prior knowledge on the structure. In this pa- 
per, we propose a novel non-Gaussianity based algorithm for more gen- 
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the disjoint subsets of variables by iteratively evaluating the indepen- 
dence between the variable subset and the residuals when the remaining 
variables are regressed on those. However, its computational cost grows 
exponentially according to the number of variables. Therefore, we further 
discuss an efficient approximate approach for applying the algorithm to 
large sized graphs. We illustrate the algorithm with artificial and real- 
world datasets. 



1. Introduction 

In order to discover or understand data generating mechanisms, a graphical 
model has been used as a fundamental tool, and the problem of finding its structure 
from data has been received much attention in many fields including social sciences 
. PQ, bioinformatics [9] and neuroinformatics [8]. 

| Among variety of models, structural equation models (SEMs) and Bayesian net- 

. works (BNs) have been widely used to analyze causal relationships in empirical 

| studies [TJ [TU1 H3]- However, the full structure, i.e., a causal ordering and con- 

nection strengths, of the model cannot be identified in most cases without prior 
knowledge on the structure when only covariance structure of data is used for 
model estimation as is the case in almost conventional methods. Recently, it is re- 
' ported that non-Gaussian structure of data overcomes this identifiability problem in 

the case of linear directed acyclic graphs (DAGs) [TTJ [T^] . By their algorithms (the 
LiNGAM algorithms), if the external influences are non-Gaussian, the structure can 
be uniquely estimated by using observed data only without any prior knowledge 
(under an assumption of acyclicity). 

However, the applicability of the LiNGAM algorithms might be restricted in 
some real-world applications because of its relatively strong assumptions of linear 
acyclicity in each variable. For example, in the cases where an unobserved con- 
founder exists between exogenous variables or sink variables, a DAG structure is no 
longer appropriate to apply. Thus, it would be useful to develop a non-Gaussianity 
based framework to estimate the structure of more general class of models, such as 
chain graphs [7] , so as to deal with the situations under which the assumptions on 
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DAGs are not satisfied. Note that there is a method based on non-Gaussianity that 
takes unobserved confounders into account [4]. However, since this method needs 
to model unobserved variables explicitly, the computational cost is crucially high 
(In fact, only two or three variables were empirically treated in the paper). 

In this paper, we propose a non-Gaussian variant of chain graphs, which in- 
cludes the one of linear acyclic graphs as a special case, and present an algorithm 
for the estimation of this model. The algorithm finds an ordering of the subsets 
of variables by iteratively evaluating the independence between the variable subset 
and the residuals when the remaining variables are regressed on those. In addition 
to the applicability to chain graphs, it is empirically verified that the estimation 
by the proposed algorithm works reasonably well compared with the existing al- 
gorithms when applied to DAGs. However, this procedure needs to compute the 
independence exponentially many times corresponding to the number of variables. 
Therefore, we propose an approximate approach that can be performed without 
depending on the number of variables (although the accuracy may depend on) and 
can be applied to large scale graphs. The performance will be illustrated using 
artificial and real-world datasets. 

The reminder of this paper is organized as follows. In Sect. El we first intro- 
duce a linear non-Gaussian acyclic model for sets of variables (GroupLiNGAM 
model). Then in Sect. [31 we present an algorithm for (directly) estimating the 
GroupLiNGAM model. However, this approach would be inefficient for large sized 
graphs. Therefore, in Sect.21 we give an approximate approach that can be applied 
to large sized graphs based on the algorithm described in Sect. [3j The algorithms 
are illustrated and examined in its performance using artificial data in Sect. [5] and 
real-world data in Sect. [51 Finally, we give conclusions in Sect. 

2. GroupLiNGAM model 

In this paper, we consider a non- Gaussian variant of chain graphs, which we call 
the GroupLiNGAM model. 

Assume that observed data are generated from a process represented graphically 
by a chain graph on random variables x of dimension p. Let us express this chain 
graph by a p x p adjacency matrix B = where every fyj represents the 

connection strength from a variable Xj to another xi in the chain graph. Also, let 
K(l) (I — 1, ...,m, m < p) be ordered blocks, i.e., disjoint subsets of variables, 
so that no variables in later subsets influence any variable in earlier subsets and 
K(l) U • • • U K(m) — V, where V :— {1, . . . ,p} is the indices set of the variables!! 
The index of the subset, i.e., I, that Xi belongs to will be referred as l(i). Moreover, 
assume that the relations between variables in different subsets are linear. Without 
loss of generality, each observed variable Xi is assumed to have zero mean. 

Then, the GroupLiNGAM model is represented as 

( I ) Xi ^ y bij Xj -\- ei, 

l(j)<l(i),i^j 

where is an external influence. All external influences e^'s are non-Gaussian 
random variables with zero means and non-zeros variances, and independent of 



This definition is a generalization of the one of a DAG. That is, this is actually the definition 
of a DAG if all the subsets consist of one element, i.e., m = p. 
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(a) (b) 

Figure 1. Illustrative example of a chain graph. 



each other in different blocks. Alternatively, we write the model ([I]) in a matrix 
form: 

(2) x = Bx + e, 

where B can be permuted by simultaneous equal row and column permutations to 
be lower block-triangular due to the acyclicity of disjoint subsets in chain graphs 
P31 H]. Moreover, if we represent the model © as 

(3) x = Ae, 

the matrix A (:= (I - By 1 ) (called a mixing matrix) also becomes lower block- 
triangular (and with all unities in the diagonal). Note that, in the case of m = p, 
i.e., the DAG case, the model (J3|) defines the independent component analysis 
(ICA) model [5] since the components of e are independent and non-Gaussian. 
Since the ICA model is identifiable, the model ([3]) in this case (m = p) is also 
identifiable, which is the key idea of the original LiNGAM algorithm [TT] (we call 
it ICA-LiNGAM in the later part of this paper). 

Now, let us consider an illustrative example in which the model is represented 
by (cf. Figure Q] (a)) 

Xi = ei, 

x 2 = b 2 \xi + e 2 , 

(4) x 3 = 632X2 + e 3 , 

x 4 = 642X2 + 643X3 + e4, 
£5 = hlXi + 654X4 + e 5 , 

where unobserved confounders / and g exist between e± and e 2 and between and 
e5, respectively, as 

ei = cif + di and e 2 = c 2 / + d 2 , 

and 

e4 = C4<7 + c?4 and e§ — c$g + d$. 

d±, d 2 , c?4 and c?5 are independent of each other. Note that, in this case, the assump- 
tion for the LiNGAM algorithms, i.e., exogenous influences are independent of each 
other, is not satisfied. In fact, x\ and X4 depend on X2 and x§, respectively, because 
/ and g are not observed, and a DAG representation is no longer appropriate to 
apply. The ordered blocks for the example @ are K(l) = {1,2}, K{2) = {3} and 
K(3) = {4,5} (cf. Figure [T](b)). 
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3. Model estimation 

In this section, we address the estimation of the GroupLiNGAM model from 
data. In the following parts, we will refer the subset of variables corresponding to 
S C V as xs- Also, we denote by V \ S the complementary set of V with respect 
to S, and by x§ the subset of variables corresponding to V \ S. 

3.1. Identifying exogenous variables using non-Gaussianity. Recently, it 
has been reported that non-Gaussianity of external influences serves for directly 
estimating the ordering of variables from data [12] (DirectLiNGAM). The key in- 
sight herein is that, once an exogenous variable is identified, we can remove the 
component of the exogenous variable from the other variables without violating the 
original ordering for the residuals when the exogenous variable is regressed on the 
remaining variables. Here, we describe the analogous insight still holds for sets of 
variables. To this end, we first need the following assumption: 

Definition 1 (correlation- faithfulness) . The distribution of x is said to be correlation- 
faithful to the generating graph if correlation and conditional correlation of Xi are 
entailed by the graph structure, i.e., the zero s/non- zeros status ofbij, but not by 
specific parameter values of bij . 

This concept is motivated by the faithfulness [T3]. Also, we give the definition 
of the exogenous set of variables as follows. 

Definition 2 (exogenous set). Let the partition of the variables x be x = (xs,xg) 
such that xs and xg are not empty. Then, the subset of variables x$ is said to 
be exogenous against Xg, if the corresponding partition of the the matrix B has the 
following form: 

B s 



B = 



B S,S B £ 



Note that each variable in the exogenous set is not necessarily an exogenous 
variable. That is, the variables in an exogenous set may be influenced by each 
other inside of the set. Also note that the submatrix of the mixing matrix A 
corresponding to Bs is full-rank because the covariance matrix Ss of xs is also 
full-rank from the correlation-faithfulness assumption. 

Now, we give two lemmas and one corollary that is the basis of the algorithm 
proposed in this paper. 

Lemma 3. Assume that the input data x follows the GroupLiNGAM model 
and that the distribution of x is correlation-faithful to the generating graph. Let r^ s ) 
be the residual when is regressed on xs for S C V : r^ = — ^Tj^xs, 
where 



£ = 



is the covariance matrix of (xs^xg). Then, a set of variables Xs is exogenous if 
and only if xs is independent of its residual r^- s \ 



Proof First, assume that xs is exogenous. Then, one can write xs = A§ s A s 1 xs- 



where 



5 



is the coefficient matrix in Eq. <j3j> - From the definition of model §3$), e.g — Ageg 
and xg are mutually independent. Also, since = Ag^gAg Sg, AggAg 1 is 

equivalent to the regression coefficients when xg is regressed on x g. Therefore, 
r^ 5 ) is equivalent to eg . As a result, xg and r^) are mutually independent. 
Next, assume that xg is independent of r^ s \ Then, since xg is independent of eg, 
all elements of the regression coefficient matrix when xg is regressed on eg, i.e., 
A s g, become zeros, which means all elements of the upper right part of B, i.e., 
Bg g, are also zeros. From the correlation-faithfulness assumption and the defini- 
tion of exogeneous sets, xg is exogenous. ■ 

Lemma 4. Assume the assumptions of Lemma [3] and that a set of variables xg 
is exogenous. Let r^ s ' be the residual vector when xg is regressed on xg for 
S C V. Then, GroupLiNGAM models hold both for xg and , respectively : 
xg = BgXg + eg and = B^r^ + e^ s \ where Bg and B^ are matrices 
that can be permuted to be block lower-triangular by simultaneous row and column 
permutations, and elements of eg and e^ s ' are non-Gaussian and mutually inde- 
pendent in different blocks, respectively. 

Proof Without loss of generality, assume that B in the GroupLiNGAM model 
([2|) is already permuted to be lower block-triangular (which means A is also to be 
lower block-triangular with all unities in the diagonal). First, it is straightforward 
from Dcf. [2] that Bg is lower block-triangular. Next, since xg is exogenous, the 
regression coefficients when xg is regressed on xg becomes Ag gAg 1 . Therefore, 
removing the effects of xg from by least-squares estimation is equivalent to 
setting all elements of the first \S\ columns of A to be zeros. This means that 
the residuals are not influenced by xg because of the correlation-faithfulness 
assumption. As a result, we again obtain a lower block-triangular mixing matrix 
with all unities in the diagonal A^- s \= Ag) for r^ s K ■ 

Corollary 5. Assume the assumptions in Lemma\J^ Denote by ls(i) and l r ts)(i) 
the indices of the ordered subsets encoded by the chain graphs on xg and , 
respectively. Recall that l(i) denotes the index of the ordered subsets encoded by 
the chain graph on x. Then, the ordering of the subsets of Xg and are 
respectively equivalent to that of corresponding original subsets of variables, i.e., 
h(h) < h(i2) <=> 1(h) < 1(h) and l r ( S )(h) < l r (s)(h) & 1(h) < 1(h)- 

Proof As described in the proof of Lemma |H the adjacency matrices (and the 
mixing matrices) for the GroupLiNGAM models on xg and are equivalent to 
the corresponding parts of the one for the GroupLiNGAM model on x. This shows 
the orderings of xg and r( s ) are not changed. ■ 

Lemma [3] indicates that an exogenous set is identified by evaluating the inde- 
pendence between a set of variables xg and its residuals r^ s \ Lemma 0] implies 
that the GroupLiNGAM models for the p-dimensional vector xg and the (p — \S\)- 
dimensional residual vector can be handled as new input models, and Lemma[3] 
can be further applied to the each model to derive the next set of exogenous vari- 
ables. This process can be repeated until all subsets of variables are not able to be 
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Algorithm 1 GroupLiNGAM 

1: Given a p-dimensional variables x, a set of its subscripts V, apxn data matrix 

of the variables X, initialize an ordered subset of variables as K <— 0. 
2: Call K 4- GroupSearch (V, K, X). 

3: Construct a lower block-triangular matrix B by following the order in K, and 
estimate the connection strengths bij (using some conventional covariance-based 
regression, such as least-squares and maximum-likelihood approaches) on the 
original variables x and data matrix X. 

function K 4— GroupSearch (U , K, Xy) 
l: for S C U do 

2: Perform least-squares regression of x$ on imj (denote the residual vector 
by and its residual data matrix by R/ s ') and then compute some inde- 
pendence measure I(S) between xs and r^ S ', e.g., MI(xs,r^ S '). 

3: end for 

4: S* := argmin I(S). 

5: if I(S*) < S and \U\ ^ 1 then 

6: Set Xu\ s , ^R< s *). 

7: Call K <- GroupSearch (S*,K, X s , ) 

8: Call K 4- GroupSearch (U\S*,K, X[,\ St ) 

9: else 

10: Append S 1 * to the end of K. 
11: end if 



devided, and the resulting order of the sets of variable subscripts shows the causal 
order of the original observed variables according to Corollary [5] 

As the independence measure used in Lemma[3j the mutual information between 
the subset of variable and the residuals, i.e., MI(xs,r^ s '), would be available. 
There are many options for its estimation from data. In the later experiments, we 
used an algorithm based on the fc-nearest neighbors method [5] Q This method has 
one tuning parameters, i.e., the number of neighbors kneig. Although the setup of 
this parameter is not trivial, the algorithm is known to work well empirically when 
kneig is set as 3-5 % of the dimensional p [6] . 

3.2. GroupLiNGAM algorithm. Based on the above result, we now present an 
algorithm to estimate a block causal ordering and the connection strengths in the 
GroupLiNGAM model under the correlation-faithfulness assumption. The pseudo- 
code of the algorithm is shown in Alg. [TJ 

The algorithm is performed by the recursive calls of GroupSearch function, which 
devides a given subset U into ordered two groups. Since an exogenous set is iden- 
tified by evaluating the independence between a subset of U and its residuals from 
Lemma [31 we find such subset S*(c U) as the one that minimizes some indepen- 
dence measure I(S) (Lines 1-3 in GroupSearch in Alg. [T]). Thus, U is divided into 
ordered two groups 5* and U \ S*. From Lemma [4j for each of S* and U \ S*, 
the GroupLiNGAM models hold. Therefore, this procedure is iterated until fur- 
ther partition cannot be found, which is judged with a threshold S (Line 8-9 in 



4 We used the M ATLAB code available from http : / /www . klab . caltech . edu/~kraskov/MILCA/ 
in the experiments. 
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{1, /, 3, 4, 5} V (C/={1, 2, 3, 4, 5}) 



2nd call of GroupSearch n m ~ o-, 
(C/= {1,2,3J) > ' • ' ' 



5th call of GroupSearch 



3rd call of GroupSearch i /] 2} 
(£/={!, 2}) ' 




3 ; V(J/ ={4, 5}) 



|| 4th call of GroupSearch 



' V (U= {3}) 



Order 



Figure 2. Illustration of Alg. [T] for the example (gj. 

GroupSearch in Alg. [IJ. Finally-obtained order of variable subsets is consistent 
globally, which is guaranteed by Corollary [5] The illustration of this procedure for 
the example Eq. (j4} is shown in Fig. [2j 

Note that Alg. Q]is specialized to the DAG case if we set <5 = +00. However, 
the outputs by Alg. [1] and the DirectLiNGAM algorithm are not always same be- 
cause Alg. Q] finds the subset of variables that is exogenous against the remaining 
variables while the DirectLiNGAM algorithm identifies an exogenous variable iter- 
atively (that is, we can say that the former uses global information of independence 
between variables while the latter local one). Thus, the accumulation of errors of 
regression in Alg. [1] is expected to be no more than the one in the DirectLiNGAM 
algorithm, which will be illustrated empirically in Sect. [SJ 

4. Approximate approach for large graphs 

Since Alg. [1] needs to compute independence between x$ and exponen- 
tially many times (2l £/ l _1 , once for every S C f/jl at each iteration (Lines 1-3 in 
GroupSearch), it can be applied only to medium sized graphs (consisting of up to 
around 15 nodes). Here, we propose an approximate approach based on Alg. [T] 
applicable to larger sized graphs (L-GroupLiNGAM). 

The basic idea of the proposed algorithm is as follows. If we observe only a subset 
of variables, then some of the unobserved variables may act as confounders against 
some of the observed variables and, as a result, causal directions between such 
observed variables have become not identifiable. However, since the definition of our 
model permits confounders, i.e., makes edges undirected if there exist confounders 
between the observed variables, we can find the order of blocks which is identifiable 
from the currently observed variables using Alg. [TJ Therefore, by randomly picking 
up the subset of variables such that the set of the subsets covers all variables and 
applying Alg. [T]to each subset, we finally obtain a block ordering of all variables in 
a large graph. The validness of this procedure can be guaranteed by the following 
proposition: 

Proposition 6. Assume the assumptions in Lemma^4\ Let denote by (T <zV ) 
the ordering of the subsets of variables when only Xt is observed ( and other variables 
Xf are not observed). Then, the order ^t(*) is consistent with the one when all 
variables are observed, i.e., It^i) < ^t(*2) =^ ^(*i) < U*2)- 

Proof Assume that only xt is observed for T c V. If Zt(«i) < It^), then there 
exists a subset S C T exogenous against T \ S such that i\ 6 S and i-2 G T \ S . 



5 U = V (\V\ = p) at the first iteration. 
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Algorithm 2 L-GroupLiNGAM 

1: Given a p-dimensional variables x, a set of its subscripts V, apxn data matrix 
of the variables X and a cardinality h, initialize the list of orders between 
combinations of variables k = 0. 

2: Compute a random covering T(i) (i = 1, . . . , N) of variables with cardinarity 
h. 

3: for i = 1, . .., N do 

4: Apply Alg. [TJmodefied by replacing Line 1 in GroupSearch func. to flSJ with 

V <— ^(i), and add new orders from its output K to A;. 
5: end for 

6: Construct a block order K for all variables from k. 

7: Construct a strictly lower block-triangular matrix B by following the order in K, 
and estimate the connection strengths bij (using some conventional covariance- 
based regression, such as least-squares and maximum-likelihood approaches) on 
the original variables x and data matrix -X". 



Therefore, one can write x$ — A$es and x T \s — A^s^A^xs + ^T\se T \5, 
where x$ and A T \ s e T \ s are mutually independent. This means, if we denote 
as e s = J2ieS! a 3,i e i and ^t\s = Eies, a T\s,i e i, where Si C S U (V \ T) and 
^2 C (T \ S) U (V \ T), then the union of Si and S 2 is empty, i.e., Si n S 2 = 0. 
This means that all elements of the submatrix {(%■} (i € Si U S, j £ V \ (Si U S)) 
of the mixing matrix A are zeros and, as a result, l(ii) < ■ 

Based on the above result, we now present an algorithm for estimating the Grou- 
pLiNGAM model with large number of variables. The pseudo-code of the algorithm 
is shown in Alg.[5J where k is the list of combinations of variables (ji, j 2 ) with orders 
l(ji)<l(j 2 )- 

In the algorithm, we first generate a random covering of all variables T(i) (i = 
1, . . . , N) (Line 2 in Alg. [2]), i.e., subsets T(i) C V such that U i= i,... tN T(i) = V. 
And, we apply Alg. [T] to each T(i) (Lines 3-5 in Alg. [2]). Then, in order to reflect 
already-known orders (ji,j2) (ji, 32 G T(i)) when choosing S C U in Lines 1-3 in 
GroupSearch function in Alg. [U we replace Lines 1 in GroupSearch to the following: 

(5) for S C U s.t. 32 £ S -> ji $ U \ S for (ji , j 2 ) £ k do 

Also, the application of Alg. [1] may generate an output making a cycle as a whole 
when combined with previously-obtained orders due to statistical uncertainty of 
samples as the iterations of Lines 3-5 in Alg. [2] continue. In such a case, the 
inconsistent (old and new) orders need to be removed, i.e., we merge the ordered 
variables by these orders into a group. Finally, the ordering of variable subsets is 
constructed from the list of obtained orders. Although this procedure may not be 
able to find some of block orders, more and more ones are expected to be found 
depending on subsets T{i) as the iteration continue. 

5. Simulations 

In this section, we evaluate the proposed algorithms empirically using artificial 
datasets. Especially, we focus on (i) the evaluation of the validity of the proposed 
algorithms for estimating the GroupLiNGAM model (Alg. [T] and Alg. [2]) and (ii) 




Figure 3. Scatter-plots of the estimated fry by Alg. [T] (vertical 
axis) versus the generating values (horizontal axis) for combina- 
tions of dimensionality p = (5, 10) and the number of samples 
n = (500,1000). 



the comparison of estimation accuracy of the proposed and existing algorithms 
(ICA-LiNGAM pi] and DirectLiNGAM [12]) in DAG cases. 

First, for the purpose (i), we created datasets under each combination of number 
of variables p, sample size n and coverage cardinality h (for Alg. [5]), as followsQ 

(1) First, apxp block lower-triangular matrix B was randomly created so 
that the standard deviations of variables owing to their parents (deter- 
mined from its ordering of the subsets of variables) ranged in the interval 
[0.5, 1.5], where the number of blocks and the maximum number of parents 
of the created network for B were also randomly determined from 1 to p in 
uniform manner. The standard deviations of the external influences e were 
randomly selected from the interval [0.5, 1.5]. 

(2) Next, we generated data with sample size n by independently drawing the 
external influence variables e from various non-Gaussian distributions with 
zero mean and unit variance. This is performed by generating Gaussian 
variables Zi with zero means and unit variances, transformed it as = 
sign(zi)|zi|' 3i , where nonlinear exponents qi were randomly selected from 
the interval [0.5,0.8] U [1.2,2.0]0 and then standardizing e; to have zero 
means and unit variables. 

(3) The values of the observed variables x were generated according to the 
GroupLiNGAM model ([2]). And, the order of x is permuted randomly. 



^The way of creating datasets is the same as [12] except for that B is a block lower-triangular. 
^Nonlinear exponents qi with [0.5,0.8] and [1.2,2.0] give sub-Gaussian and super-Gaussian 
variables, respectively. 
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Figure 4. Scatter-plots of the estimated by Alg. [5] (vertical 
axis) versus the generating values (horizontal axis) for combina- 
tions of dimensionality p = (50, 100) and coverage cardinality 
h = (5,8) (n = 1000). 

The graphs in Fig. [3] and Fig. |4] show the scatter-plots of the elements of the 
estimated and generating adjacency matrix B (for randomly generated 10 datasets 
in the respective case). GroupLiNGAM (Alg. [[]) and L-GroupLiNGAM (Alg. [2]) 
were respectively applied to relatively small and large sized graphs (p~5, 10 (Fig. [3]) 
and p=50, 100 (Fig. [4}). The parameters S and kneig were set as 1.0 x 10~ 2 and 
0.05 x n, respectively. For Alg. [2] the number of subsets in a covering, i.e., N, 
was set as 50 in the experiment. Although the estimation seems to fail sometimes 
depending on dimensionality p, the number of samples n or coverage cardinality h, 
the estimation seems to work reasonably well. 

Next, for the purpose (ii), we created datasets as in the same manner with 
the procedure described in [llj . which is same with the above procedure except 
that each block contains only one element. As described above, GroupLiNGAM 
is specialized to the DAG case by setting S as +00 and thus, in this experiment, 
(5 was set as 1 x 10 6 for Alg. [TJ The graphs in Fig. [5] show the medians of the 
numbers of errors, i.e., the numbers of elements in the strictly upper triangular 
part when the true connection strength matrix B is permuted according to the 
estimated orders by the algorithms. The median errors by the algorithms are similar 
for almost experimental conditions and, hence, we can say that the estimation of 
GroupLiNGAM works reasonably well in the DAG case too. Here, we should note 
again that GroupLiNGAM can be applied not only to DAGs but also to chain 
graphs while the existing algorithms (ICA-LiNGAM and DirectLiNGAM) cannot. 

6. Application to real data 

To evaluate the applicability of the proposed algorithm (GroupLiNGAM), we 
analyzed a dataset taken from a sociological data repository on the Internet called 
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Figure 5. Median numbers of errors in estimated orders by the 
existing and proposed algorithms when applied to DAG cases (Left: 
p = 6 and Right: p = 10). 
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Figure 6. Status attainment model based on domain knowledge, 
where {1, 3, 6}<{5}<{4}<{2}. 

General Social Survey@ The data consisted of six observed variables, x\. fa- 
ther's occupation level, X2- son's income, x%: father's education, x^\ son's oc- 
cupation level, x§: son's education, x^: number of siblings. The sample size was 
1,380. Fig.|5]shows domain knowledge about their causal relations: K(1)={1, 3, 6}, 
K(2)={5}, K (3)={4} and K(4)={2}. In this section, we represent such relations 
by {1, 3, 6}<{5}<{4}<{2} to save space. Note that if {i, j}<{k}, Xi and Xj could 
directly and/or indirectly cause Xk, but not vice versa. 

In this experiment, Alg. Q] was applied since the number of variables is small. 
We tested several numbers of nearest neighbors kneig — 40, 50, 60, 70 to compute 
mutual information using the fc-nearest neighbor approach \jo\ for GroupLiNGAM. 
The estimated networks were not sensitive to the choice of the number of nearest 
neighbors kneig, and essentially the same results were obtained for the values of 
kneig. We show the results under kneig=50 in Tab. [TJ where a smaller threshold 
value for independence S gives a network between larger groups of variables. 

We first analyzed all the six variables. The estimated orders by ICA-LiNGAM 
[IT] . Direct-LiNGAM [H] and GroupLiNGAM are shown at the second top of 
Tab. [1] Those estimated orders are difficult to interpret since son's income (22) 
and/or son's education (25) could cause father's vari- 
ables (xi, X3), but not vice versa. The orders are not reasonable to their temporal 
orderings. 



"http : //www . norc . org/GSS+Website/ 



12 



Domain knowledge: 


{1,3,6}<{5}<{4}<{2} 


All the six variables analyzed. 


ICA-LiNGAM: 


{5}<{6}<{3}<{1}<{4}<{2} 


DirectLiNGAM: 


{6}<{2}<{1}<{3}<{4}<{5} 


GroupLiNGAM: 




5=0.500 


{6}<{2}<{1}<{4}<{5}<{3} 


(5=0.100 


{6}<{2}<{1}<{4,5}<{3} 


(5=0.010 


{6}<{2}<{1,3,4,5} 


(5=0.001 


{1,2,3,4,5,6} 


X2 omitted. 




ICA-LiNGAM: 


{5}<{6}<{3}<{1}<{4} 


DirectLiNGAM: 


{6}<{1}<{3}<{4}<{5} 


GroupLiNGAM: 




(5=0.500 


{6}<{1}<{3}<{5}<{4} 


(5=0.100 


{6}<{1,3}<{5}<{4} 


(5=0.050 


{6}<{1,3}<{4,5} 


(5=0.010 


{6}<{1,3,4,5} 


(5=0.001 


{1,3,4,5,6} 


X2 and £6 omitted. 




ICA-LiNGAM: 


{5}<{3}<{1}<{4} 


DirectLiNGAM: 


{1}<{3}<{4}<{5} 


GroupLiNGAM: 




(5=0.50 


{3}<{1}<{5}<{4} 


(5=0.10 


{1,3}<{5}<{4} 


5=0.05 


{1,3}<{4,5} 


5=0.01 


{1,3,4,5} 



Table 1. Estimated orders of groups. 



Next, we omitted son's income (x%) and analyzed the other five variables. Omit- 
ting X2 would not create any unobserved confounder since it does not cause any 
other variables according to the domain knowledge. The results are shown in the 
third top of Tab. [Q DirectLiNGAM and GroupLiNGAM found consistent time 
orderings between father's variables (2:1,2:3) and son's variables (£4,2:5). Further- 
more, GroupLiNGAM found a reasonable ordering between son's variables, i.e., 
son's education (2:5) could cause son's occupation level (£4), but not vice versa, 
whereas DirectLiNGAM failed. However, number of siblings (xq) is the top vari- 
able in every estimated ordering by DirectLiNGAM and GroupLiNGAM and could 
cause father's variables (2:1,2:3), which is not easy to interpret. 

We further omitted number of siblings (xg) as well as son's income (2:2) and 
analyzed the other four variables (2:1,2:3,2:4,2:5). Omitting x$ could create an unob- 
served confounder since it could relate father's variables (2:1,2:3) and son's variables 
(2:4,2:5). The bottom of Tab. [T] shows the results. Every ordering estimated by 
GroupLiNGAM is consistent with the domain knowledge. ICA-LiNGAM wrongly 
estimated that son's education (2:5) could cause father's variables (2:1,2:3), but not 
vice versa. DirectLiNGAM also gave inconsistent orderings between father's educa- 
tion (2:3) and father's occupation (2:1) and between son's education (2:5) and son's 
occupation (2:4). 



13 



In summary, GroupLiNGAM provided more consistent orderings with the do- 
main knowledge than ICA-LiNGAM and DirectLiNGAM. The reason would be that 
only GroupLiNGAM is able to allow unobserved confounders. However, it is not 
yet very clear why the inclusion of X2 and xq makes the results difficult to interpret. 
One possibility is that xi and xq might not fit well some assumption in the three 
discovery methods, e.g., linearity, compared to the other four variables. 

7. Conclusions 

In this paper, we proposed the GroupLiNGAM model, a non-Gaussian variant 
of chain graphs, and presented an algorithm for estimating this model, which is 
identifiable without any prior knowledge on the structure. Based on the result that 
an exogeneous set is identified by evaluating the independence between a variable 
subset and the residuals when the remaining variables are regressed on those, the 
proposed algorithm finds an ordered devision of variables iteratively and identifies 
an ordering of disjoint subsets of variables. However, since the computational cost 
grows exponentially according to the number of variables, a middle sized graph is 
the practical limit of this algorithm. Therefore, in addition, we presented an approx- 
imate approach to apply this framework to large sized graphs. In the experimental 
parts, we evaluated the algorithms empirically and illustrated the applicability us- 
ing artificial and real datasets. 

The algorithm has a tuning parameter 5, which determines when the devision 
of groups should be stopped (kneig is also an tuning parameter in the current 
implementation. However, this parameter is for the estimation of mutual infor- 
mation by the A;-nearest neighbor method [6] and thus would not be an essential 
parameter in our method). For more exact devision of groups, it would be useful 
to combine our framework with some statistical test method, such as the bootstrap 
method [3], in the future. Also, in the current implementation, exponentially large 
number of subsets need to be examined when identifying an exogenous (Linel-3 in 
GroupSearch in Alg.[lJ. Therefore, it would be important to develop more efficient 
search strategy for this part using some discrete structure. 
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